Predictive analysis of lower limb fractures in the orthopedic complex operative unit using artificial intelligence: the case study of AOU Ruggi

The length of stay (LOS) in hospital is one of the main parameters for evaluating the management of a health facility and of its departments across the different specializations. Healthcare costs, as well as the profit margin, are closely linked to this parameter. In the orthopedic field, predicting this parameter is increasingly complex and of fundamental importance for planning resources, estimating the waiting times for scheduled interventions, and managing the department and the related surgical interventions. The purpose of this work is to predict and evaluate LOS using machine learning methods and multiple linear regression, starting from clinical data of patients hospitalized with lower limb fractures. The data were collected at the "San Giovanni di Dio e Ruggi d'Aragona" hospital in Salerno (Italy).

www.nature.com/scientificreports/
In the field of orthopedic surgery, machine learning algorithms and predictive models have been successfully applied and have proved effective in improving different health processes. Artificial neural network models, compared with logistic regression models, were used to predict one-year mortality in elderly patients with intertrochanteric femoral fractures 31. Machine learning (ML) was used to analyze the healing times of lower limb fractures in children aged 0-12 years, using Random Forest and Self Organizing Feature Maps methods 32. Neural networks and Random Forest were useful in selecting features for the evaluation of locomotor system degradation 33. A multivariate logistic regression model was used to determine whether distal fractures of both upper and lower limbs occur in a higher percentage of diabetic patients taking thiazolidinedione than of those not taking it 34.
Our aim is to analyze the variables influencing the LOS of orthopedic patients through the medical records of the "San Giovanni di Dio e Ruggi d'Aragona" University Hospital of Salerno, with particular attention to patients who were treated in 2019 and 2020 for fractures of the tibia and lower limbs. The collected data were used to model and predict overall hospital length of stay following a twofold approach (Fig. 1): a multiple linear regression analysis and an ML classification analysis, the latter performed to predict LOS grouped in weeks. To this end, we designed different ML models (Random Forest, Decision Tree, Gradient Boosted Trees, Logistic Regression, Naïve Bayes and Support Vector Machine) and trained them on these data. We then discuss the potential of the resulting model as a tool for hospital management. The present work is both an extension and an improvement of a previous paper that the same authors presented at a conference 62. It is an extension because the dataset considered is much larger in terms of both the number of records and the number of variables. It is an improvement because we have moved from classifying the length of hospital stay (LOS) in weeks to predicting it precisely using regression techniques.

Materials and methods
The dataset was built by extracting information about 123 patients operated on and hospitalized in 2019-2020 from the QuaniSDO information system in use at the "San Giovanni di Dio e Ruggi d'Aragona" hospital. The information collected for patients undergoing CS in the two hospitals considered is biographic (e.g., gender or age), hospital-related (e.g., admission or discharge date) and clinical (e.g., comorbidities and complications during surgery).
The different machine learning algorithms were implemented using the KNIME Analytics Platform to address the LOS prediction task.
Furthermore, the dataset was expanded in order to extend the analysis to more patients and a larger number of variables. The data were extracted from the "QuaniSDO" information system with the following inclusion criterion: "All patients with a principal diagnosis of lower limb fracture from 2011 to 2020 in both of the aforementioned departments".
In this manner, 706 hospital discharge forms were extracted with the following information for each patient:
• Year of discharge (2011-2020);
• Gender (Male/Female);
• Age;
• Date of admission, discharge and procedure, from which LOS and preoperative LOS were obtained.
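As an illustration of how LOS and preoperative LOS can be derived from the three dates, a minimal Python sketch follows. It assumes that both quantities are counted as whole-day differences between dates; the paper does not state the exact counting convention, and the function names and the sample record are hypothetical.

```python
from datetime import date


def length_of_stay(admission: date, discharge: date) -> int:
    """Overall LOS in days, assumed here to be discharge date minus admission date."""
    return (discharge - admission).days


def preoperative_los(admission: date, procedure: date) -> int:
    """Preoperative LOS in days, assumed here to be procedure date minus admission date."""
    return (procedure - admission).days


# Hypothetical record, not taken from the study dataset.
admission = date(2019, 3, 4)
procedure = date(2019, 3, 6)
discharge = date(2019, 3, 12)

los = length_of_stay(admission, discharge)
pre_los = preoperative_los(admission, procedure)
print(los, pre_los)
```

Other conventions (for example, counting the admission day as day 1) would shift both values by one day, so the convention should be fixed before the model is fitted.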
In order to create the multiple linear regression (MLR) model capable of predicting LOS, the following information was considered:
• Gender (Male/Female);
• Age;
• Department, encoded according to hospital rules with "3612" for Orthotraumatology and "3641" for Orthopaedics and Traumatology;
• Comorbidities (yes/no). Cardiovascular disease, hypertension, diabetes and obesity were considered.
Machine learning algorithms. ML models can be divided into supervised models, which learn from historical data to classify samples in the inference phase, and unsupervised models, which aim to find hidden patterns in order to cluster the samples. In this section, we discuss the machine learning models used in our analysis. All of them fall into the first category: the learning phase (also called training phase) is carried out on a subset of the samples (usually 80%), whilst the remaining part is used to evaluate the designed model (test/inference phase). We used the Decision Tree (DT), an algorithm that bases its prediction on the construction of a decision tree. In each node, a condition is verified and, according to the value assumed by one of the features, a path is determined through one of the branches. This is repeated until a value is assigned to the target variable. Random Forest (RF) and Gradient Boosted Trees (GBT) also rely on the tree data structure, but use a set of trees in order to improve on the performance of the single tree used by DT. In this way, it is possible to build a strong predictive model, although overfitting problems can arise. Naïve Bayes (NB) learns from historical information by using Bayes' theorem. Finally, Support Vector Machine (SVM) is based on the construction of a hyperplane that separates the different classes identified in the training phase. Therefore, it has a more complex structure than DT. It is widely used on non-linear and small data sets.

In addition, the 3 algorithms that had the highest accuracy in the conference paper were used, in particular with 75% of the data for training and 25% for testing. In this case, LOS was divided into weeks.

The multiple linear regression model is

$$ y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \varepsilon $$

in which we can note that:
• y corresponds to the LOS value;
• β0 is the intercept value;
• xi are the independent variables (n in total);
• βi are the estimated regression coefficients of the respective variables;
• ε is the regression error.
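To make the regression model defined above concrete, the following sketch estimates β0 and the βi by ordinary least squares. It is only an illustration under stated assumptions: the study itself was carried out in KNIME, the pure-Python solver and the function names (`solve_linear_system`, `fit_mlr`, `predict`) are hypothetical, and the toy data are noise-free values built for the example, not patient records.

```python
def solve_linear_system(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Ordinary least squares via the normal equations; returns [beta_0, beta_1, ...]."""
    design = [[1.0] + list(r) for r in rows]  # the leading 1 models the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Predicted LOS for one patient: beta_0 + sum_i beta_i * x_i."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Hypothetical toy data: each row is (age, preoperative LOS in days).
rows = [(60, 1), (72, 3), (81, 2), (55, 0), (90, 4), (67, 2)]
# Targets generated without noise from known coefficients, so the fit can recover them.
y = [2 + 0.05 * age + 1.5 * pre for age, pre in rows]

beta = fit_mlr(rows, y)
print([round(b, 3) for b in beta])
```

With real data the error term ε is not zero, so the recovered coefficients are estimates and the residuals must be checked against the model assumptions.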

Results
In the previous paper [Colella et al. 62], the algorithms implemented were those listed in Table 1.
The best performance was obtained with the NB algorithm, with an accuracy equal to 92%. The results obtained with the extended dataset are presented below. In particular, Table 2 shows the distribution of the different characteristics for the 706 accesses in the dataset. Table 3 shows the characteristics of the regression model. An adjusted R-square of 0.805 is a good value and shows that the model is able to predict LOS adequately. The standard error is 1.902, while the Durbin-Watson statistic, lying between 1.5 and 2.5, indicates that the residuals are independent, a fundamental assumption for the model to be developed.
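The two diagnostics quoted above can be sketched as follows. The block uses the standard definitions of the adjusted R-square and of the Durbin-Watson statistic; the observed and fitted values are hypothetical, and the number of predictors `k` is an assumption made only for the example.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjust R-square for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical observed and fitted LOS values (days), not taken from the study.
y = [5.0, 7.0, 6.0, 9.0, 12.0, 8.0]
y_hat = [4.5, 6.5, 6.5, 9.5, 11.0, 8.0]
residuals = [a - b for a, b in zip(y, y_hat)]

r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(r2, n=len(y), k=2)  # k = 2 is an assumption for this toy case
dw = durbin_watson(residuals)
residuals_look_independent = 1.5 <= dw <= 2.5  # the range used in the text
print(round(r2, 3), round(adj_r2, 3), dw, residuals_look_independent)
```

The Durbin-Watson statistic ranges from 0 to 4, and values near 2 indicate no first-order autocorrelation, which is why a band around 2 is used as the acceptance range.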
A Fisher's test (F-test) was performed to assess the joint significance of the coefficients. The p-value is less than 0.05, so the model has explanatory power (Table 4). Table 5 shows, for each independent variable, the estimated coefficient, the t-test statistic and the corresponding p-value. The test is significant for p-value < 0.05; accordingly, age, gender, department, type of hospitalization and preoperative LOS significantly influence LOS.
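For reference, the F statistic of the joint significance test can be written in terms of R-square as F = (R²/k) / ((1 − R²)/(n − k − 1)), with k predictors and n observations. The sketch below evaluates this expression for hypothetical values; it does not reproduce Table 4, and the function name is an assumption.

```python
def f_statistic(r2, n, k):
    """Overall F statistic of a linear model with k predictors and n observations."""
    explained = r2 / k                      # explained variance per predictor
    unexplained = (1 - r2) / (n - k - 1)    # residual variance per degree of freedom
    return explained / unexplained


# Hypothetical values chosen only to illustrate the arithmetic.
f_value = f_statistic(r2=0.5, n=13, k=2)
print(f_value)
```

The p-value is then obtained from the F distribution with (k, n − k − 1) degrees of freedom, which requires a statistical library or tables and is not computed here.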
As can be seen from Table 6, VIF < 10 and tolerance > 0.2, so there is no evidence of multicollinearity in the data, another fundamental assumption for the model to be developed. Figure 2 shows that the residuals are approximately normally distributed, as almost all points lie on the diagonal. As can be seen in Fig. 3, all Cook's distance values are much less than 1, so there are no outliers affecting the model.
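The multicollinearity check can be illustrated with a small sketch. In general, the VIF of a predictor is 1/(1 − R_j²), where R_j² comes from regressing that predictor on all the others, and tolerance is its reciprocal. The block below covers only the special case of two predictors, where R_j² reduces to the squared Pearson correlation; the data and function names are hypothetical.

```python
def pearson_r(x, z):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    cov = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    var_x = sum((a - mx) ** 2 for a in x)
    var_z = sum((b - mz) ** 2 for b in z)
    return cov / (var_x * var_z) ** 0.5


def vif_two_predictors(x, z):
    """Return (VIF, tolerance) for a model with exactly two predictors."""
    tolerance = 1 - pearson_r(x, z) ** 2
    return 1 / tolerance, tolerance


# Hypothetical predictor values, not taken from the study dataset.
age = [60, 72, 81, 55, 90, 67]
pre_los = [2, 1, 3, 0, 2, 4]

vif, tol = vif_two_predictors(age, pre_los)
no_multicollinearity = vif < 10 and tol > 0.2  # thresholds used in the text
print(round(vif, 3), round(tol, 3), no_multicollinearity)
```

With more than two predictors, each R_j² has to be obtained from an auxiliary regression, which statistical packages report directly.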
Finally, having considered more variables and a longer time frame, the 3 ML algorithms that performed best in the conference paper were also applied, in order to understand how the accuracy varied (Table 7). The results of the ML analysis are presented in terms of accuracy, precision, sensitivity, specificity and F-measure. Table 8 shows the RF confusion matrix, in which 134 out of 177 predictions were correct, namely those on the diagonal of the matrix.
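The evaluation measures listed above can be derived from a confusion matrix as sketched below, treating each LOS class one-versus-rest. The 3-class matrix is hypothetical and does not reproduce Table 8; only the ratio 134/177 is taken from the text. Rows are assumed to be actual classes and columns predicted classes.

```python
def accuracy(matrix):
    """Share of predictions on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


def one_vs_rest_metrics(matrix, k):
    """Precision, sensitivity, specificity and F-measure for class k."""
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                    # class k predicted as another class
    fp = sum(row[k] for row in matrix) - tp     # other classes predicted as class k
    tn = sum(map(sum, matrix)) - tp - fn - fp
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical confusion matrix for three LOS classes (weeks).
matrix = [
    [40, 5, 5],
    [4, 30, 6],
    [1, 5, 4],
]

acc = accuracy(matrix)
week1 = one_vs_rest_metrics(matrix, 0)
reported_rf_accuracy = 134 / 177  # correct predictions over total, as stated in the text
print(acc, week1, round(reported_rf_accuracy, 3))
```

The last line shows that 134 correct predictions out of 177 correspond to an accuracy of about 75.7%.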
Finally, a global surrogate Random Forest was used, that is, a Random Forest model trained to approximate the predictions of the original model. The Random Forest was trained on pre-processed input data in a standard way with the optimized parameters "Tree Depth," "Number of Patterns," and "Minimum Size of Child Nodes." The surrogate model was successfully trained with an accuracy of 0.986 with respect to the class of interest predicted by the original model, "LOS week: 1." The importance of a feature is calculated by counting how many times it was selected for a split, and at what level, among all available (candidate) features in the random forest trees. A higher value indicates greater feature importance (Fig. 4).
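The two quantities described above can be sketched in a few lines. The first function measures how often the surrogate reproduces the predictions of the original model. The second scores a feature by the ratio between the number of times it was chosen for a split and the number of times it was a candidate, summed over tree levels; summing the per-level ratios is an assumption, since the text does not give the exact aggregation. All names, predictions and counts are hypothetical.

```python
def fidelity(original_pred, surrogate_pred):
    """Share of samples on which the surrogate agrees with the original model."""
    agree = sum(o == s for o, s in zip(original_pred, surrogate_pred))
    return agree / len(original_pred)


def split_importance(splits, candidates):
    """Sum over tree levels of (#times chosen for a split / #times a candidate)."""
    return sum(s / c for s, c in zip(splits, candidates) if c > 0)


# Hypothetical predictions of the original model and of the surrogate.
original = ["week 1"] * 6 + ["other"] * 4
surrogate = ["week 1"] * 5 + ["other"] * 5  # disagrees on one sample

# Hypothetical per-level counts: (splits per level, candidates per level).
feature_stats = {
    "preoperative LOS": ([30, 40], [50, 80]),
    "age": ([10, 25], [50, 80]),
    "gender": ([2, 5], [50, 80]),
}

agreement = fidelity(original, surrogate)
importance = {name: split_importance(s, c) for name, (s, c) in feature_stats.items()}
ranking = sorted(importance, key=importance.get, reverse=True)
print(agreement, ranking)
```

A surrogate with high agreement can be inspected in place of the original model, which is what makes its feature ranking informative.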
Preoperative LOS was the most important feature, followed by age, procedure, and principal diagnosis.

Discussion and conclusion
In this paper, we investigated the LOS prediction task, which aims to reduce both hospital resource use and costs in order to support the decision-making process of managers. In fact, improving LOS prediction makes it possible to enhance bed estimation and to focus hospital resources mainly on patients affected by several diseases, while also decreasing bed occupancy. LOS prediction can also be useful for several other applications (e.g., reimbursement or accounting 63,64). For this reason, the MLR model and several ML models were designed and appropriately trained to predict the LOS of patients with lower limb fractures. Our experimental evaluation, carried out on a large cohort of patients, shows that RF achieves the highest accuracy (75.7%) in predicting LOS. Thus, with a larger dataset containing more accesses and more variables, the ML algorithms returned a lower accuracy than in the previous work, which reported an accuracy of 88% 62. The MLR model, with an R-square of 0.80, proves to be a valid decision support tool for this type of patient. This task can further support hospital management in its decision-making process. This type of "double analysis" has already been performed to predict the LOS of patients who have undergone femur fracture 54. In fact, the first analysis is performed with MLR, which predicts LOS as a point value, and the second analysis uses different ML algorithms to classify LOS in weeks (3 groups). As in the aforementioned study, the ML results are good, with accuracy above 75%. As for MLR, in our case the model is superior, with a much higher R-square.
$$ \text{Specificity} = \frac{\text{number of true negatives}}{\text{number of true negatives} + \text{number of false positives}} \qquad (4) $$

As the results show, the analysis developed from the additional dataset takes into consideration a greater number of variables than the initial one, as well as a greater number of accesses. The results show a significant influence of age, gender, department, type of hospitalization and preoperative LOS on the increase of LOS (Table 5).
A comparison of the significance of the regression coefficients (Table 5) and the importance ranking of the features (Fig. 4) obtained by applying the machine learning models reveals that the most influential factors, according to the RF algorithm, are preoperative LOS, age, and procedure type, which only partially overlap with the significant regression coefficients. In fact, preoperative LOS and age proved to be significant predictors in both multiple regression and machine learning models, while procedure type assumed greater significance as a predictor of LOS in the ML analysis than in the regression analysis. Finally, ward, type of hospitalization, and sex were significant for the regression analysis but not very significant for the ML algorithms. These results suggest that predictive models of the healthcare process should be interpreted with caution, taking into account the value and effect of the predictors chosen and used in the models. In fact, comparing the relevance of the predictors in the regression and classification models examined is an essential part of assessing the validity of the results and should guide the search for reasonable and interpretable results when dealing with predictive algorithms in the healthcare context.
Taking the additional variables and the related parameters into account makes it possible to enhance the performance of the proposed approach on a cohort of patients with lower limb fractures, although it can increase the complexity of the entire system.

Data availability
The datasets generated and/or analyzed during the current study are not publicly available for privacy reasons but could be made available from the corresponding author on reasonable request.